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Abstract. The paper discusses the problem of determinising finite-state automata contain- 
ing large numbers of e-moves. Experiments with finite-state approximations of natural 
language grammars often give rise to very large automata with a very large number of 
e-moves. The paper identifies three subset construction algorithms which treat e-moves. 
A number of experiments has been performed which indicate that the algorithms differ 
considerably in practice. Furthermore, the experiments suggest that the average number 
of e-moves per state can be used to predict which algorithm is likely to perform best for a 
given input automaton. 



1 Introduction 

In experimenting with finite-state approximation techniques for context-free and more pow- 
erful grammatical formalisms (such as the techniques presented in ?), ?), ?)) we have found 
that the resulting automata often are extremely large. Moreover, the automata contain many 
e-moves (jumps). And finally, if such automata are determinised then the resulting automata 
are often smaller. It turns out that a straightforward implementation of the subset construction 
determinisation algorithm performs badly for such inputs. 

As a motivating example, consider the definite-clause grammar that has been developed 
for the OVIS2 Spoken Dialogue System. This grammar is described in detail in ?). After re- 
moving the feature constraints of this grammar, and after the removal of the sub-grammar for 
temporal expressions, this context-free skeleton grammar was input to an implementation of 
the technique described in ?). 1 The resulting non-deterministic automaton (labelled zovis2 be- 
low) contains 89832 states, 80935 e-moves, and 80400 transitions. The determinised automaton 
contains only 6541 states, and 60781 transitions. Finally, the minimal automaton contains only 
78 states and 526 transitions! Other grammars give rise to similar numbers. Thus, the approx- 
imation techniques yield particularly 'verbose' automata for relatively simple languages. 

The experiments were performed using the FSA Utilities toolkit (?). At the time, an old 
version of the toolkit was used, which ran into memory problems for some of these automata. 
For this reason, the subset construction algorithm has been re-implemented, paying special 
attention to the treatment of e-moves. Three variants of the subset construction algorithm are 
identified which differ in the way e-moves are treated: 



per graph The most obvious and straightforward approach is sequential in the following 
sense. Firstly, an equivalent automaton without e-moves is constructed for the input. In 
order to do this, the transitive closure of the graph consisting of all e-moves is computed. 

1 A later implementation by Nederhof (p.c.) avoids construction of the complete non-determistic 
automaton by determinising and minimising subautomata before they are embedded into larger 
subautomata. 
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Secondly, the resulting automaton is then treated by a subset construction algorithm for 
e-free automata. 

per state For each state which occurs in a subset produced during subset construction, com- 
pute the states which are reachable using e-moves. The results of this computation can 
be memorised, or computed for each state in a preprocessing step. This is the approach 
mentioned briefly in ?). 2 

per subset For each subset Q of states which arises during subset construction, compute Q' D 
Q which extends Q with all states which are reachable from any member of Q using e- 
moves. Such an algorithm is described in ?). We extend this algorithm by memorising the 
e-closure computation. 

The motivation for this paper is the experience that the first approach turns out to be 
impractical for automata with very large numbers of e-moves. An integration of the subset 
construction algorithm with the computation of e-reachable states performs much better in 
practice. The per subset algorithm almost always performs better than the per state approach. 
However, for automata with a low number of jumps, the per graph algorithm outperforms the 
others. 

In constructing an e-free automaton the number of transitions increases. Given the fact that 
the input automaton already is extremely large (compared to the simplicity of the language it 
defines), this is an undesirable situation. An equivalent e-free automaton for the example given 
above results in an automaton with 2353781 transitions. The implementation of per subset is the 
only variant which succeeds in determinising the input automaton of this example. 

In the following section some background information concerning the FSA Utilities tool- 
box is provided. Section 3 then presents a short statement of the problem (determinise a given 
finite-state automaton), and a subset construction algorithm which solves this problem in the 
absence of e-moves. Section 4 identifies three variants of the subset construction algorithm 
which take e-moves into account. Finally, section 5 discusses some experiments in order to 
compare the three variants both on randomly generated automata and on automata generated 
by approximation algorithms. 

2 FSA Utilities 

The FSA Utilities tool-box is a collection of tools to manipulate regular expressions, finite- 
state automata and finite-state transducers (both string-to-string and string-to-weight trans- 
ducers). Manipulations include determinisation (both for finite-state acceptors and finite-state 
transducers), minimisation, composition, complementation, intersection, Kleene closure, etc. 
Various visualisation tools are available to browse finite-state automata. The tool-box is imple- 
mented in SICStus Prolog. 

The motivation for the FSA Utilities tool-box has been the rapidly growing interest for 
finite-state techniques in computational linguistics. The FSA Utilities tool-box has been devel- 
oped to experiment with these techniques. The tool-box is available free of charge under Gnu 
General Public License. 3 The following provides an overview of the functionality of the tool- 
box. 

2 According to Derick Wood (p.c), this approach has been implemented in several systems, including 
Howard Johnson's INR system. 

3 See http://www.let.rug.n1/%7Evannoord/Fsa/.The automata used in the experiments are 
available from the same site. 



- Construction of finite automata on the basis of regular expressions. Regular expression 
operators include concatenation, Kleene closure, union and option (the standard regular 
expression operators). Furthermore the extended regular expression operators are pro- 
vided: complement, difference and intersection. Symbols can be intervals of symbols, or 
the 'Any'-variable which matches any symbol. Regular expression operators are provided 
for operations on the underlying automaton, including minimisation and determinisation. 
Finally, we support user-defined regular expression operators. 

- We also provide operators for transductions such as composition, cross-product, same- 
length-cross-product, domain, range, identity and inversion. 

- Determinisation and Minimisation. Three different minimisation algorithms are sup- 
ported: Hopcroft's algorithm (?), Hopcroft and Ullman's algorithm (?), and Brzozowski's 
algorithm (?). 

- Determinisation and minimisation of string-to-string and string-to-weight transducers (?; 
?). 

- Visualisation. Support includes built-in visualisation (Tcl/Tk, LaTeX, Postscript) and inter- 
faces to third party graph visualisation software (Graphviz (dot), VCG, daVinci). 

- Random generation of finite automata (an extension of the algorithm in ?) to allow the 
generation of finite automata containing e-moves). 

3 Subset Construction 
3.1 Problem statement 

Let a finite-state machine M be specified by a tuple (Q, E, 8, S, F) where Q is a finite set of 
states, E is a finite alphabet, 5 is a function from Q x (E U {e}) ^ 2 Q . Furthermore, S C Q is a 
set of start states 4 and F C Q is a set of final states. 

Let e-move be the relation qj)\qj £ 8{q l , e)}. e-reachable is the reflexive and transitive 
closure of e-move. Let e-CLOSURE: 2 Q — > 2® be a function which is defined as: 

e-CLOSURE(Q') = {q\q' e Q', (q',q) G e-reachable} 

For any given finite-state automaton M = (Q, E, S, S, F) there is an equivalent determinis- 
tic automaton M' = (2 Q , E, 5', {Q }, F'). F' is the set of all states in 2 Q containing a final state 
of M, i.e., the set of subsets {Qi € 2 Q \q e Qu q 6 F}. M' has a single start state Q which is 
the epsilon closure of the start states of M, i.e., Q = e-CLOSURE(5'). Finally, 

6'({q u q 2 , q t }, a) = e-CLOSURE (S {q x , a) U S(q 2 , a) U . . . U S(q u a)) 

An algorithm which computes M' for a given M will only need to take into account states 
in 2*2 which are reachable from the start state Qo- This is the reason that for many input au- 
tomata the algorithm does not need to treat all subsets of states (but note that there are au- 
tomata for which all subsets are relevant, and hence exponential behaviour cannot be avoided 
in general). 

Consider the subset construction algorithm in figure 1. The algorithm maintains a set of 
subsets States. Each subset can be either marked or unmarked (to indicate whether the subset 

4 Note that a set of start states is required, rather than a single start state. Many operations on automata 
can be defined somewhat more elegantly in this way. Obviously, for deterministic automata this set 
should be a singleton set. 



funct subseLconstruction((Q, S, 8, S, F)) 

index JransitionsQ ; Trans := 0; Finals := 0; States := 0; 

Start —: epsilon_closure(S) 

add(Start) 

while there is an unmarked subset T G States do 
mark(F) 

foreach (a, U) G instructions (T) do 

17 := epsilon_closure(U) 

Trans[F,a] := {[/} 

add ((7) 
od 

od 

return (States, X 1 , Trans, {Start}, Finals) 
end 

proc add ((7) Reachable-state-set Maintenance 

if U <£ States 
then 

add U unmarked to States 
UUC\F then Finals := Finals UUg 

fi 
end 



funct instructions (P) 

return merge ([J eP transitions (p)) 
end 

funct epsilon_closure(U) 

return £/ 
end 



Instruction Computation 



variant 1: No e-moves 



Figure 1. Subset-construction algorithm. 



has been treated by the algorithm); the set of unmarked subsets is sometimes referred to as 
the agenda. The algorithm takes such an unmarked subset T and computes all transitions 
leaving T. This computation is performed by the function instructions and is called instruction 
computation by ?). 

The function index_transitions constructs the function transitions : Q — > S x 2 Q . This func- 
tion returns for a given state p the set of pairs (s, T) representing the transitions leaving p. 
Furthermore, the function merge takes such a set of pairs and merges all pairs with the same 
first element (by taking the union of the corresponding second elements). For example: 



merge({(a, {1, 2, 4}), (6, {2, 4}), (a, {3, 4}), (6, {5, 6})}) = {(a, {1, 2, 3, 4}), (6, {2, 4, 5, 6})} 

The procedure add is responsible for 'reachable-state-set maintenance', by ensuring that 
target subsets are added to the set of subsets if these subsets were not encountered before. 
Moreover, if such a new subset contains a final state, then this subset is added to the set of 
final states. 



4 Three Variants for e-Moves 



The algorithm presented in the previous section does not treat e-moves. In this section three 
possible extensions of the algorithm are identified to treat e-moves. 

4.1 Per graph 

This variant can be seen as a straightforward implementation of the constructive proof that 
for any given automaton with e-moves there is an equivalent one without e-moves (?)[page 
26-27]. 

For a given M = (Q, S, 8, S, F) this variant first computes M' = (Q, S, 8', S', F), where 
S' = e-CLOSURE^), and 5'(q,a) = e-CLOSURE(%, a)). The function e-CLOSURE is com- 
puted by using a standard transitive closure algorithm for directed graphs: this algorithm is 
applied to the directed graph consisting of all e-moves of M. Such an algorithm can be found 
in several textbooks (see, for instance, ?)). 

The advantage of this approach is that the subset construction algorithm does not need 
to be modified at all. Moreover, the transitive closure algorithm is fired only once (for the 
full graph), whereas the following two variants call a specialised transitive closure algorithm 
possibly many times. 

4.2 Per subset and per state 

The per subset and the per state algorithm use a variant of the transitive closure algorithm for 
graphs. Instead of computing the transitive closure of a given graph, this algorithm only com- 
putes the closure for a given set of states. Such an algorithm is given in figure2. 



funct closure(T) 
D =: 

foreach t e T do add t unmarked to D od 
while there is an unmarked state t e D do 
mark (t) 

foreach q e 8(t, e) do 

if q ^ D then add q unmarked to D fi 
od 

od 

return D 
end 

Figure 2. Epsilon-closure Algorithm 

In either of the two integrated approaches, the subset construction algorithm is initialised 
with an agenda containing a single subset which is the e-CLOSURE of the set of start-states of 
the input; furthermore, the way in which new transitions are computed also takes the effect 
of e-moves into account. Both differences are accounted for by an alternative definition of the 
epsilon.closure function. 



The approach in which the transitive closure is computed for one state at a time is defined 
by the following definition of the epsilon.closure function. Note that we make sure that the 
transitive closure computation is only performed once for each input state, by memorising the 
closure function. 



funct epsilon_closure(U) variant 2: per state 

return {J ueU memo (closure ({u})) 
end 

In the case of the per subset approach the closure algorithm is applied to each subset. We also 
memorise the closure function, in order to ensure that the closure computation is performed 
only once for each subset. This can be useful since the same subset can be generated many 
times during subset construction. The definition simply is: 

funct epsilon_closure(U) variant 3: per subset 

return memo (closure (U)) 
end 

The motivation for per state approach may be the insight that in this case the closure algo- 
rithm is called at most \Q\ times. In contrast, in the per subset approach the transitive closure 
algorithm may need to be called 2>®> times. On the other hand, in the per state approach some 
overhead must be accepted for computing the union of the results for each state. Moreover, in 
practice the number of subsets is often much smaller than 2^L In some cases, the number of 
reachable subsets is smaller than the number of states encountered in those subsets. 



5 Experiments 

Two sets of experiments have been performed. In the first set of experiments a number of 
random automata is generated according to a number of criteria (based on ?)) . In the second 
set of experiments, results are provided for a number of (much larger) automata that surfaced 
during actual development work on finite-state approximation techniques. 

Random automata. Firstly, consider a number of experiments for randomly generated au- 
tomata. Following ?), the absolute transition density of an automaton is defined as the number 
of transitions divided by the square of the number of states times the number of symbols (i.e. 
the number of transitions divided by the number of possible transitions). Deterministic transi- 
tion density is the number of transitions divided by the number of states times the number of 
symbols (i.e. the ratio of the number of transitions and the number of possible transitions in 
a deterministic machine). ?) shows that deterministic transition density is a reliable measure for 
the difficulty of subset construction. Exponential blow-up can be expected for input automata 
with deterministic transition density of around 2. 5 

A number of automata were generated randomly, according to the number of states, sym- 
bols, and transition density. The random generator makes sure that all states are reachable 
from the start state. For the first experiment, a number of automata was randomly generated, 
consisting of 15 symbols, and 15, 20, 25, 100 or 1000 states, using various densities (and no 

5 Leslie uses the terms absolute density and deterministic density. 



e-moves). The results are summarised in figure 3. Only a single result is given since each of the 
implementations works equally well in the absence of e-moves. 6 

A new concept called absolute jump density is introduced to specify the number of e-moves. 
It is defined as the number of e-moves divided by the square of the number of states (i.e., the 
probability that an e-move exists for a given pair of states). Furthermore, deterministic jump 
density is the number of e-moves divided by the number of states (i.e., the average number 
of e-moves which leave a given state). In order to measure the differences between the three 
implementations, a number of automata has been generated consisting of 15 states and 15 sym- 
bols, using various transition densities between 0.01 and 0.3 (for larger densities the automata 
tend to collapse to an automaton for S*). For each of these transition densities, jump densi- 
ties were chosen in the range 0.01 to 0.24 (again, for larger values the automaton collapses). 
In figure 4 the outcomes of this experiment are summarised by listing the average amount of 
CPU-time required per deterministic jump density (for each of the three algorithms). Thus, 
every dot represents the average for determinising a number of different input automata with 
various absolute transition densities and the same deterministic jump density. The figures 5, 6 
and 7 summarise similar experiments using input automata with 20, 25 and 100 states. 7 

The striking aspect of these experiments is that the per graph algorithm is more efficient for 
lower deterministic jump densities, whereas, if the deterministic jump density gets larger, the 
per subset algorithm is more efficient. The turning point is around a deterministic jump density 
between 1 and 1.5, where it seems that for larger automata the turning point occurs at a lower 
determinisic jump density. Interestingly, this generalisation is supported by the experiments 
on automata which were generated by approximation techniques (although the results for 
randomly generated automata are more consistent than the results for 'real' examples). 

Experiment: Automata generated by approximation algorithms The automata used in the previ- 
ous experiments were randomly generated, according to a number of criteria. However, it 
may well be that in practice the automata that are to be treated by the algorithm have typical 
properties which were not reflected in this test data. For this reason results are presented for 
a number of automata that were generated using approximation techniques for context-free 
grammars (?; ?; ?). In particular, a number of automata has been used generated by Mark-Jan 
Nederhof using the technique described in ?). In addition, a small number of automata have 
been used which were generated using the technique of ?) (as implemented by Nederhof). 

The automata typically contain lots of jumps. Moreover, the number of states of the result- 
ing automaton is often smaller than the number of states in the input automaton. Results are 
given in table 1. One of the most striking examples is the ygrim automaton consisting of 3382 
states and 10571 jumps. For this example, the per graph implementation ran out of memory 

6 CPU-time was measured on a HP 9000/780 machine running HP-UX 10.20, 240Mb, with SICStus Pro- 
log 3 #3. For comparison with an "industrial strength" implementation, we have applied the deter- 
miniser of AT&T's FSM utilities for the same examples. The results show that for automata with very 
small transition densities FSM is faster (up to 2 or 3 times as fast), but for automata with larger den- 
sities the results are very similar, in some cases our Prolog implementation is even faster. Note finally 
that our timings do include IO, but not the start-up of the Prolog engine. 

7 We also provide the results for FSM again; we used the pipe f smrmepsilon | f smdeterminize 
. According to Fernando Pereira (pc) the comparison is less meaningful in this case because the fsm- 
rmepsilon program treats weighted automata. This generalisation requires some overhead also in case 
no weights are used (for the determiniser this generalisation does not lead to any significant overhead). 
Pereira mentions furthermore that FSM used to include a determiniser with integrated treatment of 
jumps. Because this version could not (easily) be generalised for weighted automata it was dropped 
from the tool-set. 
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Figure 3. Deterministic transition density versus CPU-time in msec. The input automata have no 
e-moves. 
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Figure 4. Average amount of CPU-time versus jump density for each of the three algorithms, and FSM. 
Input automata have 15 states. Absolute transition densities: 0.01-0.3. 
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Figure 5. Average amount of CPU-time versus jump density for each of the three algorithms, and FSM. 
Input automata have 20 states. Absolute transition densities: 0.01-0.3. 
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Figure 6. Average amount of CPU-time versus deterministic jump density for each of the three algo- 
rithms, and FSM. Input automata have 25 states. Absolute transition densities: 0.01-0.3. 
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Figure 7. Average amount of CPU-time versus deterministic jump density for each of the three algo- 
rithms, and FSM. Input automata have 100 states. Absolute transition densities: 0.001-0.0035. 



(after a long time), whereas the per subset algorithm produced the determinised automaton 
relatively quickly. The FSM implementation took much longer for this example (whereas for 
many of the other examples it performs better than our implementations). Note that this ex- 
ample has the highest number of jumps per number of states ratio. 
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Table 1. Results for automata generated by approximation algorithms. The dashes in the table indicate 
that the corresponding algorithm ran out of memory (after a long period of time) for that particular 
example. 



6 Conclusion 



We have discussed three variants of the subset-construction algorithm for determinising finite 
automata. The experiments support the following conclusions: 

- the per graph variant works best for automata with a limited number of jumps 

- the per subset variant works best for automata with a large number of jumps 

- the per state variant almost never outperforms both of the two other variants 

- typically if the deterministic jump density of the input is less than 1, then the per graph 
variant outperforms the per subset variant. If this value is larger than 1.5, then per subset 
outperforms per graph. 

- the per subset approach is especially useful for automata generated by finite-state approxi- 
mation techniques, because those techniques often yield automata with very large number 
of e-moves. 
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